Multimodal intent recognition for natural human-robotic interaction

Author

  • James Rossiter
Abstract

The research questions posed for this work were as follows:

  • Can speech recognition and topic spotting techniques be used to identify spoken intent in unconstrained natural speech?
  • Can gesture recognition systems based on statistical speech recognition techniques be used to bridge the gap between physical movements and recognition of gestural intent?
  • How can speech and gesture be combined to identify the overall communicative intent of a participant with better accuracy than recognisers built for individual modalities?

To answer these questions, a corpus collection experiment for Human-Robotic Interaction was designed to record unconstrained natural speech and three-dimensional motion data from 17 participants. A speech recognition system was built on the popular Hidden Markov Model Toolkit, and a topic spotting algorithm based on usefulness measures was designed. These were combined to create a speech intent recognition system capable of identifying intent from natural unconstrained speech. A gesture intent recogniser was also built using the Hidden Markov Model Toolkit to identify intent directly from 3D motion data.

The speech and gesture intent recognition systems were first evaluated separately. Their outputs were then combined, and the integrated intent recogniser was shown to perform better than either recogniser alone. Both linear and non-linear methods of multimodal intent fusion were evaluated, and the same techniques were applied to the output of the individual intent recognisers. In all cases, the non-linear combination of intent scores gave the highest performance for all intent recognition systems.

Combining speech and gestural intent scores gave a maximum classification performance of 76.7% of intents correctly classified, using a two-layer Multi-Layer Perceptron for non-linear fusion with human-transcribed speech as input to the speech classifier. Compared with simply picking the highest-scoring single-modality intent, this represents an improvement of 177.9% over gestural intent classification, 67.5% over a speech intent classifier using human-transcribed speech and 204.4% over a speech intent classifier using automatically recognised speech.
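
The difference between picking the highest-scoring single-modality intent, linear fusion and non-linear fusion can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the equal fusion weight, the `tanh` hidden layer, the toy scores and the hand-set perceptron weights are all assumptions made for illustration, and a real system would learn the weights from the corpus.

```python
import math


def pick_highest_single(speech, gesture):
    """Baseline: intent whose score is highest in either modality alone."""
    candidates = [(s, i) for i, s in enumerate(speech)]
    candidates += [(g, i) for i, g in enumerate(gesture)]
    return max(candidates)[1]


def linear_fusion(speech, gesture, w_speech=0.5):
    """Linear fusion: weighted sum of the per-intent scores.

    w_speech is a hypothetical weight; 0.5 treats both modalities equally.
    """
    return [w_speech * s + (1.0 - w_speech) * g for s, g in zip(speech, gesture)]


def mlp_fusion(speech, gesture, w_hidden, b_hidden, w_out, b_out):
    """Non-linear fusion: forward pass of a two-layer perceptron.

    The input is the concatenation of both score vectors. The weights are
    supplied by the caller; in practice they would be trained, not hand-set.
    """
    x = list(speech) + list(gesture)
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(w_out, b_out)
    ]


def argmax(scores):
    """Index of the largest score."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Toy scores for three hypothetical intents (not data from the thesis).
speech_scores = [0.6, 0.35, 0.05]
gesture_scores = [0.1, 0.5, 0.4]

baseline_intent = pick_highest_single(speech_scores, gesture_scores)
linear_intent = argmax(linear_fusion(speech_scores, gesture_scores))

# Hand-set weights for one hidden unit and three outputs, purely illustrative.
mlp_scores = mlp_fusion(
    speech_scores,
    gesture_scores,
    w_hidden=[[1.0, 0.0, 0.0, 0.0, 0.0, 0.0]],
    b_hidden=[0.0],
    w_out=[[1.0], [0.0], [-1.0]],
    b_out=[0.0, 0.1, 0.0],
)
mlp_intent = argmax(mlp_scores)

print(baseline_intent, linear_intent, mlp_intent)
```

In this toy case the baseline and the linear fusion disagree, because the weighted sum rewards an intent that is moderately supported by both modalities over one that is strongly supported by only one.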


Similar resources

Finding Regions of Interest from Multimodal Human-Robot Interactions

Learning new concepts, such as object models, from human-robot interactions entails different recognition capabilities on a robotic platform. This work proposes a hierarchical approach to address the extra challenges from natural interaction scenarios by exploiting multimodal data. First, a speech-guided recognition of the type of interaction happening is presented. This first step facilitates t...


Multimodal Human-computer Interaction

While human-to-human communication takes advantage of an abundance of information and cues, human-computer interaction is limited to only a few input modalities (usually only keyboard and mouse) and provides little flexibility as to choice of communication modality. In this paper, we present an overview of a family of research projects we are undertaking at Carnegie Mellon and Karlsruhe Univers...


Real-Time Human-Robot Interaction for a Service Robot Based on 3D Human Activity Recognition and Human-like Decision Mechanism

This paper describes the development of a real-time Human-Robot Interaction (HRI) system for a service robot based on 3D human activity recognition and a human-like decision mechanism. The HRI system, which allows one person to interact with a service robot using natural body language, collects sequences of 3D skeleton joints comprising rich human movement information abo...


Multimodal Emotion Recognition for Human-Computer Interaction: A Survey

Today, computers and their applications have pervaded our daily lives (ubiquitous computing). The interaction between users and computing devices is becoming similar to human-human interaction. Integrating emotion recognition into Human-Computer Interaction aims to make the interaction easier, smarter and more natural. This paper presents a brief survey on current state of the ...


A Natural Human-Computer Interface for Controlling Wheeled Robotic Vehicles

Robots are used increasingly to execute dangerous tasks and military missions. Autonomous robots are the warriors of the future, executing missions without requiring continuous supervision. Multimodal interfaces are the interfaces of the future in which speech, gestures, gaze, and other modalities are combined to provide a natural way for humans to communicate with machines. In this thesis I pr...



Publication date: 2011